{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "80c2e3fd",
   "metadata": {},
   "source": [
    "# Using AWS S3 to read/write market data with findatapy\n",
    "\n",
    "May 2021 - Saeed Amen - https://www.cuemacro.com - saeed@cuemacro.com"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5774e629",
   "metadata": {},
   "source": [
    "## What is S3?\n",
    "\n",
    "S3 is basically storage in the cloud, which is managed by AWS. Dump as much data as want from anywhere on the web and you don't need to worry about scaling your storage, which you'd obviously have to do in your own data centre, and also manage backups. Data is stored in S3 buckets, which are a bit like folders. Google Cloud Storage (GCS) is the equivalent on Google Cloud and Azure Blob is a similar service on Azure."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "e3604515",
   "metadata": {},
   "source": [
    "## What is the cost of S3?\n",
    "There are many other AWS storage services which you can find at https://aws.amazon.com/products/storage/, which are at different price levels and performance too. It is important to use the right services for storage which have the right performance cost balance for your specific use cases.\n",
    "\n",
    "The cost of S3 depends upon factors like:\n",
    "\n",
    "* how much data you store?\n",
    "* which service you use (are you using S3 Standard Storage for example, or S3 Infrequent Access Storage)?\n",
    "* how many requests you make for the data (PUT/GET etc.)?\n",
    "* which region is it in?\n",
    "* how much data you transfer from S3 out to the internet?\n",
    "\n",
    "An article on cloudhealthtech.com goes through the various ins and outs of the pricing at https://www.cloudhealthtech.com/blog/s3-cost-aws-cloud-storage-costs-explained. They note that the storage cost for Standard S3 (Nov 2020) is around 0.021 to 0.026 USD per month per GB. So for 1 TB that's around 21-26 USD per month, roughly under 300 USD per year. This excludes any of the various request costs for example, which you need to take into account. The actual cost of a hardware is a lot cheaper (a quick browse of 1 TB drives online, suggested a cost of around 50 USD), but if we manage our own hardware, we need to take into hassle of managing it, including stuff like backup, convenience of access etc. The cost of losing data is likely to be significant if we choose to host our data locally."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "edf5d74e",
   "metadata": {},
   "source": [
    "## Making AWS accessible via Python\n",
    "\n",
    "Given that S3 is in the cloud, we need to make sure that AWS services need to be accessible from Python, whether we are running our process in the cloud (which seems preferable to reduce latency) or locally. Whilst we are using Python, S3 is also accessible from many other languages.\n",
    "\n",
    "* Hence, before going through this tutorial, you'll need to go through several steps so AWS services are accessible from your machine\n",
    "    * You'll need to create an IAM user, with appropriate permissions at https://console.aws.amazon.com/iam when you are logged into the AWS Console\n",
    "        * In our case this will to have permissions to use S3\n",
    "        * Get the Access key ID and secret access key for the IAM user\n",
    "        * If you want to make S3 accessible to users outside of your AWS account, I found this explanation at https://stackoverflow.com/questions/45336781/amazon-s3-access-for-other-aws-accounts\n",
    "        * Before changing any access rights to S3, I'd strongly recommend reading https://aws.amazon.com/s3/security/ which explains the various security mechanisms including being able to block any public access at all\n",
    "    * Install AWS CLI\n",
    "        * run `sudo apt install awscli`\n",
    "        * or you can download the zip file\n",
    "        * run `aws configure` to set the default access key ID, default AWS availability zone etc.\n",
    "        * this will create files in ~/.aws/credentials and ~/.aws/config\n",
    "    * AWS CLI instructions at https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2-linux.html#cliv2-linux-install\n",
    "\n",
    "* Once your  credentials are set, we can use boto3, which is an SDK for Python developers to access AWS resources:\n",
    "    * `boto3` instructions https://boto3.amazonaws.com/v1/documentation/api/latest/index.html\n",
    "    * You can install `boto3` using pip\n",
    "    * You also need to install `s3fs` using pip to get access to S3 via Python\n",
    "    * If you follow the instructions at https://github.com/cuemacro/teaching/blob/master/pythoncourse/installation/installing_anaconda_and_pycharm.ipynb - you'll create a conda environment `py38class` which includes boto3, and many useful data science libraries, which I use for my Python teaching\n",
    "    * You may need to install the latest version of findatapy from GitHub using `pip install git+https://github.com/cuemacro/findatapy.git` to run the code below."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "218a505b",
   "metadata": {},
   "source": [
    "## Creating your bucket on S3\n",
    "\n",
    "You can create your S3 bucket using AWS CLI (see https://docs.aws.amazon.com/cli/latest/reference/s3api/create-bucket.html). In the below example we create our bucket called `my-bucket` in the AWS region `us-east-1`\n",
    "\n",
    "`aws s3api create-bucket --bucket my-bucket --region us-east-1`\n",
    "\n",
    "Alternatively, you can also create it via the web GUI at https://s3.console.aws.amazon.com/s3/"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cc8de24b",
   "metadata": {},
   "source": [
    "## Using S3 with findatapy to store tick market data from Dukascopy\n",
    "\n",
    "In this notebook I'm going to show how to use S3 to easily store market data using findatapy. We are assuming that we have already setup AWS CLI with our credentials such as our access key ID etc.\n",
    "\n",
    "As a first step let's download some tick data from Dukascopy for EURUSD spot, which is a free data source using findatapy. Findatapy provides a uniform wrapper to download from many different data sources. We can predefine ticker mappings from our own nicknames for tickers to the vendor tickers. It already comes out of the box, with Dukascopy ticker mappings predefined, but these are all customisable. Note, that we haven't used the `data_engine` property. If this isn't set, then findatapy will download from our data source directly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "d2770075",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-01-26T15:57:50.705402Z",
     "start_time": "2022-01-26T15:57:50.701394Z"
    }
   },
   "outputs": [],
   "source": [
    "# First disable the log so the output is neater\n",
    "import logging, sys\n",
    "logging.disable(sys.maxsize)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "638d0de4",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-01-26T15:58:00.825278Z",
     "start_time": "2022-01-26T15:57:51.353449Z"
    }
   },
   "outputs": [],
   "source": [
    "from findatapy.market import Market, MarketDataRequest\n",
    "\n",
    "# In this case we are saving predefined tick tickers to disk, and then reading back\n",
    "from findatapy.market.ioengine import IOEngine\n",
    "\n",
    "md_request = MarketDataRequest(\n",
    "    start_date='04 Jan 2021',\n",
    "    finish_date='05 Jan 2021',\n",
    "    category='fx',\n",
    "    data_source='dukascopy',\n",
    "    freq='tick',\n",
    "    tickers=['EURUSD'],\n",
    "    fields=['bid', 'ask', 'bidv', 'askv'],\n",
    "    data_engine=None\n",
    ")\n",
    "\n",
    "market = Market()\n",
    "\n",
    "df = market.fetch_market(md_request=md_request)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6f780804",
   "metadata": {},
   "source": [
    "Let's print the output..."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "7db71367",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-01-26T15:58:02.762013Z",
     "start_time": "2022-01-26T15:58:02.738013Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                                  EURUSD.bid  EURUSD.ask  EURUSD.bidv  \\\n",
      "Date                                                                    \n",
      "2021-01-04 00:00:00.401000+00:00     1.22499     1.22503         1.12   \n",
      "2021-01-04 00:00:00.604000+00:00     1.22499     1.22502         0.75   \n",
      "2021-01-04 00:00:00.706000+00:00     1.22496     1.22499         0.75   \n",
      "2021-01-04 00:00:00.807000+00:00     1.22495     1.22498         1.12   \n",
      "2021-01-04 00:00:00.909000+00:00     1.22494     1.22498         0.75   \n",
      "...                                      ...         ...          ...   \n",
      "2021-01-04 23:59:28.697000+00:00     1.22524     1.22527         0.75   \n",
      "2021-01-04 23:59:28.798000+00:00     1.22524     1.22529         2.51   \n",
      "2021-01-04 23:59:29.628000+00:00     1.22525     1.22530         0.75   \n",
      "2021-01-04 23:59:44.365000+00:00     1.22525     1.22529         0.75   \n",
      "2021-01-04 23:59:47.099000+00:00     1.22525     1.22530         0.75   \n",
      "\n",
      "                                  EURUSD.askv  \n",
      "Date                                           \n",
      "2021-01-04 00:00:00.401000+00:00         1.57  \n",
      "2021-01-04 00:00:00.604000+00:00         0.10  \n",
      "2021-01-04 00:00:00.706000+00:00         0.75  \n",
      "2021-01-04 00:00:00.807000+00:00         0.94  \n",
      "2021-01-04 00:00:00.909000+00:00         0.64  \n",
      "...                                       ...  \n",
      "2021-01-04 23:59:28.697000+00:00         4.57  \n",
      "2021-01-04 23:59:28.798000+00:00         3.75  \n",
      "2021-01-04 23:59:29.628000+00:00         4.95  \n",
      "2021-01-04 23:59:44.365000+00:00         3.00  \n",
      "2021-01-04 23:59:47.099000+00:00         2.51  \n",
      "\n",
      "[109520 rows x 4 columns]\n"
     ]
    }
   ],
   "source": [
    "print(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "09949b07",
   "metadata": {},
   "source": [
    "Let's type in our S3 bucket address, which you'll need to change below. Note the use of `s3://` at the start of the expression."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "d673a5bf",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-01-26T15:58:42.105404Z",
     "start_time": "2022-01-26T15:58:42.092412Z"
    }
   },
   "outputs": [],
   "source": [
    "folder = 's3://type-here'"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a2bf7708",
   "metadata": {},
   "source": [
    "We can write our tick data DataFrame in Parquet format. We can give it the `MarketDataRequest` we used for fetching the data, which basically creates the filename in the format of `environment.category.data_source.freq.tickers` for high frequency data or in the format of `environment.category.data_source.freq` for daily data. This will enable us to more easily fetch the data using the same `MarketDataRequest` interface.\n",
    "\n",
    "In this case, the filename of the Parquet file is:\n",
    "\n",
    "* `s3://bla_bla_bla/backtest.fx.tick.dukascopy.NYC.EURUSD.parquet`\n",
    "* ie. the environment of our data is `backtest`\n",
    "* the `category` is `fx`\n",
    "* the `data_source` is `dukascopy`\n",
    "* the `freq` is `tick`\n",
    "* the `cut` (or time of close) is `NYC`\n",
    "* the `tickers` is `EURUSD`\n",
    "\n",
    "The Jupyter notebook [market_data_example.ipynb](../market_data_example.ipynb) explains in more detail this ticker format and the concept of a `MarketDataRequest`. We dump it disk using the `IOEngine` class. Note that the `write_time_series_cache_to_disk` and `read_time_series_from_disk` reads/writes from S3 in exactly the same way as we would do locally. We need to make sure that when we're writing to disk, we have a data licence to do so (and this will clearly vary between data vendors), and in particular, that only those who read from the disk are authorised to use that data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "a60d1b09",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-01-26T15:58:44.132791Z",
     "start_time": "2022-01-26T15:58:43.059459Z"
    }
   },
   "outputs": [],
   "source": [
    "IOEngine().write_time_series_cache_to_disk(folder, df, engine='parquet', md_request=md_request)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9532e6ec",
   "metadata": {},
   "source": [
    "We could fetch the data directly using the S3 filename ie."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "615c9203",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-01-26T15:58:46.009505Z",
     "start_time": "2022-01-26T15:58:45.196807Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                                  EURUSD.bid  EURUSD.ask  EURUSD.bidv  \\\n",
      "Date                                                                    \n",
      "2021-01-04 00:00:00.401000+00:00     1.22499     1.22503         1.12   \n",
      "2021-01-04 00:00:00.604000+00:00     1.22499     1.22502         0.75   \n",
      "2021-01-04 00:00:00.706000+00:00     1.22496     1.22499         0.75   \n",
      "2021-01-04 00:00:00.807000+00:00     1.22495     1.22498         1.12   \n",
      "2021-01-04 00:00:00.909000+00:00     1.22494     1.22498         0.75   \n",
      "...                                      ...         ...          ...   \n",
      "2021-01-04 23:59:28.697000+00:00     1.22524     1.22527         0.75   \n",
      "2021-01-04 23:59:28.798000+00:00     1.22524     1.22529         2.51   \n",
      "2021-01-04 23:59:29.628000+00:00     1.22525     1.22530         0.75   \n",
      "2021-01-04 23:59:44.365000+00:00     1.22525     1.22529         0.75   \n",
      "2021-01-04 23:59:47.099000+00:00     1.22525     1.22530         0.75   \n",
      "\n",
      "                                  EURUSD.askv  \n",
      "Date                                           \n",
      "2021-01-04 00:00:00.401000+00:00         1.57  \n",
      "2021-01-04 00:00:00.604000+00:00         0.10  \n",
      "2021-01-04 00:00:00.706000+00:00         0.75  \n",
      "2021-01-04 00:00:00.807000+00:00         0.94  \n",
      "2021-01-04 00:00:00.909000+00:00         0.64  \n",
      "...                                       ...  \n",
      "2021-01-04 23:59:28.697000+00:00         4.57  \n",
      "2021-01-04 23:59:28.798000+00:00         3.75  \n",
      "2021-01-04 23:59:29.628000+00:00         4.95  \n",
      "2021-01-04 23:59:44.365000+00:00         3.00  \n",
      "2021-01-04 23:59:47.099000+00:00         2.51  \n",
      "\n",
      "[109520 rows x 4 columns]\n"
     ]
    }
   ],
   "source": [
    "s3_filename = folder + '/backtest.fx.dukascopy.tick.NYC.EURUSD.parquet' \n",
    "df = IOEngine().read_time_series_cache_from_disk(s3_filename, engine='parquet')\n",
    "\n",
    "print(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "a8b1efc3",
   "metadata": {},
   "source": [
    "But it is more convenient to simply use the `MarketDataRequest` object we populated earlier. But in order to make it fetch from S3 instead of Dukascopy, we just need to set the `data_engine` property to give it the path of the S3 bucket and the postfix `/*.parquet`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "a5035f0e",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-01-26T15:58:49.882886Z",
     "start_time": "2022-01-26T15:58:48.280248Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "                                  EURUSD.bid  EURUSD.ask  EURUSD.bidv  \\\n",
      "Date                                                                    \n",
      "2021-01-04 00:00:00.401000+00:00     1.22499     1.22503         1.12   \n",
      "2021-01-04 00:00:00.604000+00:00     1.22499     1.22502         0.75   \n",
      "2021-01-04 00:00:00.706000+00:00     1.22496     1.22499         0.75   \n",
      "2021-01-04 00:00:00.807000+00:00     1.22495     1.22498         1.12   \n",
      "2021-01-04 00:00:00.909000+00:00     1.22494     1.22498         0.75   \n",
      "...                                      ...         ...          ...   \n",
      "2021-01-04 23:59:28.697000+00:00     1.22524     1.22527         0.75   \n",
      "2021-01-04 23:59:28.798000+00:00     1.22524     1.22529         2.51   \n",
      "2021-01-04 23:59:29.628000+00:00     1.22525     1.22530         0.75   \n",
      "2021-01-04 23:59:44.365000+00:00     1.22525     1.22529         0.75   \n",
      "2021-01-04 23:59:47.099000+00:00     1.22525     1.22530         0.75   \n",
      "\n",
      "                                  EURUSD.askv  \n",
      "Date                                           \n",
      "2021-01-04 00:00:00.401000+00:00         1.57  \n",
      "2021-01-04 00:00:00.604000+00:00         0.10  \n",
      "2021-01-04 00:00:00.706000+00:00         0.75  \n",
      "2021-01-04 00:00:00.807000+00:00         0.94  \n",
      "2021-01-04 00:00:00.909000+00:00         0.64  \n",
      "...                                       ...  \n",
      "2021-01-04 23:59:28.697000+00:00         4.57  \n",
      "2021-01-04 23:59:28.798000+00:00         3.75  \n",
      "2021-01-04 23:59:29.628000+00:00         4.95  \n",
      "2021-01-04 23:59:44.365000+00:00         3.00  \n",
      "2021-01-04 23:59:47.099000+00:00         2.51  \n",
      "\n",
      "[109520 rows x 4 columns]\n"
     ]
    }
   ],
   "source": [
    "md_request.data_engine = folder + '/*.parquet'\n",
    "\n",
    "df = market.fetch_market(md_request)\n",
    "\n",
    "print(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "05e97a48",
   "metadata": {},
   "source": [
    "It should be noted there are many other ways to dump and read Parquet files from S3. We can use `pandas.read_parquet` to directly read Parquet files from S3. Libraries like Dask also support reading Parquet directly from S3 too. I'd also checkout AWS Data Wrangler, which makes it easier to use Pandas with many AWS services including S3, at https://github.com/awslabs/aws-data-wrangler"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "89bb2401",
   "metadata": {},
   "source": [
    "## Using S3 with findatapy to store daily market data from Quandl\n",
    "\n",
    "In this case we are downloading all G10 FX crosses from Quandl, which are predefined as `fx.quandl.daily.NYC` where our \n",
    "* `category` is `fx`\n",
    "* `data_source` is `quandl`\n",
    "* `freq` is `daily`\n",
    "* `cut` is `NYC`\n",
    "\n",
    "Unlike in the previous where we specified the `MarketDataRequest` in full, here we just use the above string as shorthand, and we set the `quandl_api_key` using the `MarketDataRequest` and also the `start_date`. If we have Redis running locally (which is an in memory cache), this DataFrame will also be cached in our Redis instance. We'll show how to take advantage of this cache later. If Redis is not installed, it's not a big deal, it just means the cache won't be operational. Hence, doing repeated `MarketDataRequest` calls will end up taking longer, as findatapy will seek to get the data from the data vendor directly."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "8c76c798",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-01-26T15:58:51.399727Z",
     "start_time": "2022-01-26T15:58:51.383729Z"
    }
   },
   "outputs": [],
   "source": [
    "import os\n",
    "from findatapy.market import Market, MarketDataRequest\n",
    "\n",
    "# In this case we are saving predefined tick tickers to disk, and then reading back\n",
    "from findatapy.market.ioengine import IOEngine\n",
    "\n",
    "# Change this to your own Quandl API key\n",
    "quandl_api_key = os.environ['QUANDL_API_KEY']\n",
    "\n",
    "md_request = market.create_md_request_from_str(md_request_str='fx.quandl.daily.NYC', \n",
    "    md_request=MarketDataRequest(start_date='01 Jan 2021', finish_date='27 May 2021', quandl_api_key=quandl_api_key))"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f2a8f08f",
   "metadata": {},
   "source": [
    "We can print out the `MarketDataRequest` we just constructed. We should be able to see there `quandl` for the `data_source` and `fx` for the `category`, as well as the `start_date` (realise it's difficult!)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "b9ed76d7",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-01-26T15:58:52.346226Z",
     "start_time": "2022-01-26T15:58:52.329225Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MarketDataRequest summary - MarketDataRequest_778__abstract_curve_key-None__base_depos_currencies-EUR_GBP_AUD_NZD_USD_CAD_CHF_NOK_SEK_JPY__base_depos_tenor-ON_TN_SN_1W_2W_3W_1M_2M_3M_4M_6M_9M_1Y_2Y_3Y_5Y__category-fx__category_key-backtest.fx.quandl.daily.NYC__cut-NYC__data_engine-None__data_source-quandl__environment-backtest__expiry_date-NaT__fields-close__finish_date-2021-05-27 00:00:00__freeform_md_request-{}__freq-daily__freq_mult-1__fx_forwards_tenor-ON_TN_SN_1W_2W_3W_1M_2M_3M_4M_6M_9M_1Y_2Y_3Y_5Y__fx_vol_part-V_25R_10R_25B_10B__fx_vol_tenor-ON_1W_2W_3W_1M_2M_3M_4M_6M_9M_1Y_2Y_3Y_5Y__gran_freq-None__list_threads-1__old_tickers-None__push_to_cache-True__resample-None__resample_how-last__split_request_chunks-0__start_date-2021-01-01 00:00:00__tickers-None__trade_side-trade__vendor_fields-None__vendor_tickers-None_df\n"
     ]
    }
   ],
   "source": [
    "print(md_request)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7d31ce9a",
   "metadata": {},
   "source": [
    "We can now fetch the market data from `quandl`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "2028790f",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-01-26T15:58:54.642478Z",
     "start_time": "2022-01-26T15:58:53.297393Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "            EURUSD.close  GBPUSD.close  AUDUSD.close  NZDUSD.close  \\\n",
      "Date                                                                 \n",
      "2021-01-04        1.2254        1.3551        0.7657        0.7164   \n",
      "2021-01-05        1.2295        1.3620        0.7752        0.7242   \n",
      "2021-01-06        1.2290        1.3593        0.7787        0.7281   \n",
      "2021-01-07        1.2265        1.3551        0.7755        0.7249   \n",
      "2021-01-08        1.2252        1.3583        0.7778        0.7252   \n",
      "...                  ...           ...           ...           ...   \n",
      "2021-05-21        1.2178        1.4158        0.7731        0.7165   \n",
      "2021-05-24        1.2210        1.4147        0.7741        0.7205   \n",
      "2021-05-25        1.2233        1.4123        0.7741        0.7216   \n",
      "2021-05-26        1.2204        1.4129        0.7743        0.7292   \n",
      "2021-05-27        1.2194        1.4172        0.7736        0.7291   \n",
      "\n",
      "            USDCAD.close  USDCHF.close  USDNOK.close  USDSEK.close  \\\n",
      "Date                                                                 \n",
      "2021-01-04        1.2781        0.8811        8.5660        8.2486   \n",
      "2021-01-05        1.2700        0.8784        8.4901        8.1856   \n",
      "2021-01-06        1.2685        0.8810        8.4206        8.1855   \n",
      "2021-01-07        1.2720        0.8841        8.4384        8.1992   \n",
      "2021-01-08        1.2698        0.8843        8.4074        8.2085   \n",
      "...                  ...           ...           ...           ...   \n",
      "2021-05-21        1.2072        0.8987        8.3992        8.3251   \n",
      "2021-05-24        1.2082        0.8973        8.3577        8.3226   \n",
      "2021-05-25        1.2069        0.8971        8.3231        8.2798   \n",
      "2021-05-26        1.2111        0.8973        8.3490        8.3117   \n",
      "2021-05-27        1.2075        0.8977        8.3615        8.2918   \n",
      "\n",
      "            USDJPY.close  \n",
      "Date                      \n",
      "2021-01-04    103.190002  \n",
      "2021-01-05    102.699997  \n",
      "2021-01-06    103.250000  \n",
      "2021-01-07    103.839996  \n",
      "2021-01-08    103.889999  \n",
      "...                  ...  \n",
      "2021-05-21    108.940002  \n",
      "2021-05-24    108.809998  \n",
      "2021-05-25    108.949997  \n",
      "2021-05-26    109.099998  \n",
      "2021-05-27    109.809998  \n",
      "\n",
      "[101 rows x 9 columns]\n"
     ]
    }
   ],
   "source": [
    "df = market.fetch_market(md_request)\n",
    "\n",
    "print(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "80f3c2ff",
   "metadata": {},
   "source": [
    "Let's write this to S3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "ac94b34e",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-01-26T15:58:55.423173Z",
     "start_time": "2022-01-26T15:58:55.208982Z"
    }
   },
   "outputs": [],
   "source": [
    "IOEngine().write_time_series_cache_to_disk(folder, df, engine='parquet', md_request=md_request)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "73064cc9",
   "metadata": {},
   "source": [
    "And we can read it back using a similar call to before, except this time we set the `data_engine` property of the `MarketDataRequest`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "16b8d95c",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-01-26T15:58:56.879643Z",
     "start_time": "2022-01-26T15:58:56.450563Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "            EURUSD.close  GBPUSD.close  AUDUSD.close  NZDUSD.close  \\\n",
      "Date                                                                 \n",
      "2021-01-04        1.2254        1.3551        0.7657        0.7164   \n",
      "2021-01-05        1.2295        1.3620        0.7752        0.7242   \n",
      "2021-01-06        1.2290        1.3593        0.7787        0.7281   \n",
      "2021-01-07        1.2265        1.3551        0.7755        0.7249   \n",
      "2021-01-08        1.2252        1.3583        0.7778        0.7252   \n",
      "...                  ...           ...           ...           ...   \n",
      "2021-05-21        1.2178        1.4158        0.7731        0.7165   \n",
      "2021-05-24        1.2210        1.4147        0.7741        0.7205   \n",
      "2021-05-25        1.2233        1.4123        0.7741        0.7216   \n",
      "2021-05-26        1.2204        1.4129        0.7743        0.7292   \n",
      "2021-05-27        1.2194        1.4172        0.7736        0.7291   \n",
      "\n",
      "            USDCAD.close  USDCHF.close  USDNOK.close  USDSEK.close  \\\n",
      "Date                                                                 \n",
      "2021-01-04        1.2781        0.8811        8.5660        8.2486   \n",
      "2021-01-05        1.2700        0.8784        8.4901        8.1856   \n",
      "2021-01-06        1.2685        0.8810        8.4206        8.1855   \n",
      "2021-01-07        1.2720        0.8841        8.4384        8.1992   \n",
      "2021-01-08        1.2698        0.8843        8.4074        8.2085   \n",
      "...                  ...           ...           ...           ...   \n",
      "2021-05-21        1.2072        0.8987        8.3992        8.3251   \n",
      "2021-05-24        1.2082        0.8973        8.3577        8.3226   \n",
      "2021-05-25        1.2069        0.8971        8.3231        8.2798   \n",
      "2021-05-26        1.2111        0.8973        8.3490        8.3117   \n",
      "2021-05-27        1.2075        0.8977        8.3615        8.2918   \n",
      "\n",
      "            USDJPY.close  \n",
      "Date                      \n",
      "2021-01-04    103.190002  \n",
      "2021-01-05    102.699997  \n",
      "2021-01-06    103.250000  \n",
      "2021-01-07    103.839996  \n",
      "2021-01-08    103.889999  \n",
      "...                  ...  \n",
      "2021-05-21    108.940002  \n",
      "2021-05-24    108.809998  \n",
      "2021-05-25    108.949997  \n",
      "2021-05-26    109.099998  \n",
      "2021-05-27    109.809998  \n",
      "\n",
      "[101 rows x 9 columns]\n"
     ]
    }
   ],
   "source": [
    "df = market.fetch_market(md_request_str='fx.quandl.daily.NYC', \n",
    "                         md_request=MarketDataRequest(start_date='01 Jan 2021', finish_date='27 May 2021',\n",
    "                                                      quandl_api_key=quandl_api_key,\n",
    "                                                     data_engine=folder + '/*.parquet'))\n",
    "\n",
    "print(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "181ae8e8",
   "metadata": {},
   "source": [
    "If we set the `cache_algo` property to `cache_algo_return` and remove the `data_engine` parameter (and if we have had Redis running, and make exactly the same data requet call (same assets and dates), findatapy will look in the Redis cache locally to fetch the data. This is significantly quicker at around 20ms versus over 500ms for fetching from S3."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "d659f6d7",
   "metadata": {
    "ExecuteTime": {
     "end_time": "2022-01-26T15:58:58.336139Z",
     "start_time": "2022-01-26T15:58:58.313136Z"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "            EURUSD.close  GBPUSD.close  AUDUSD.close  NZDUSD.close  \\\n",
      "Date                                                                 \n",
      "2021-01-04        1.2254        1.3551        0.7657        0.7164   \n",
      "2021-01-05        1.2295        1.3620        0.7752        0.7242   \n",
      "2021-01-06        1.2290        1.3593        0.7787        0.7281   \n",
      "2021-01-07        1.2265        1.3551        0.7755        0.7249   \n",
      "2021-01-08        1.2252        1.3583        0.7778        0.7252   \n",
      "...                  ...           ...           ...           ...   \n",
      "2021-05-21        1.2178        1.4158        0.7731        0.7165   \n",
      "2021-05-24        1.2210        1.4147        0.7741        0.7205   \n",
      "2021-05-25        1.2233        1.4123        0.7741        0.7216   \n",
      "2021-05-26        1.2204        1.4129        0.7743        0.7292   \n",
      "2021-05-27        1.2194        1.4172        0.7736        0.7291   \n",
      "\n",
      "            USDCAD.close  USDCHF.close  USDNOK.close  USDSEK.close  \\\n",
      "Date                                                                 \n",
      "2021-01-04        1.2781        0.8811        8.5660        8.2486   \n",
      "2021-01-05        1.2700        0.8784        8.4901        8.1856   \n",
      "2021-01-06        1.2685        0.8810        8.4206        8.1855   \n",
      "2021-01-07        1.2720        0.8841        8.4384        8.1992   \n",
      "2021-01-08        1.2698        0.8843        8.4074        8.2085   \n",
      "...                  ...           ...           ...           ...   \n",
      "2021-05-21        1.2072        0.8987        8.3992        8.3251   \n",
      "2021-05-24        1.2082        0.8973        8.3577        8.3226   \n",
      "2021-05-25        1.2069        0.8971        8.3231        8.2798   \n",
      "2021-05-26        1.2111        0.8973        8.3490        8.3117   \n",
      "2021-05-27        1.2075        0.8977        8.3615        8.2918   \n",
      "\n",
      "            USDJPY.close  \n",
      "Date                      \n",
      "2021-01-04    103.190002  \n",
      "2021-01-05    102.699997  \n",
      "2021-01-06    103.250000  \n",
      "2021-01-07    103.839996  \n",
      "2021-01-08    103.889999  \n",
      "...                  ...  \n",
      "2021-05-21    108.940002  \n",
      "2021-05-24    108.809998  \n",
      "2021-05-25    108.949997  \n",
      "2021-05-26    109.099998  \n",
      "2021-05-27    109.809998  \n",
      "\n",
      "[101 rows x 9 columns]\n"
     ]
    }
   ],
   "source": [
    "df = market.fetch_market(md_request_str='fx.quandl.daily.NYC', \n",
    "                         md_request=MarketDataRequest(start_date='01 Jan 2021', finish_date='27 May 2021',\n",
    "                                                      quandl_api_key=quandl_api_key,\n",
    "                                                      cache_algo='cache_algo_return'))\n",
    "\n",
    "print(df)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "53bc582c",
   "metadata": {},
   "source": [
    "## Conclusion"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "69f30215",
   "metadata": {},
   "source": [
    "We have seen that it's pretty straightforward to store market data as Parquet files in S3 using findatapy. In particular, it makes it easy to use very similar `MarketDataRequest` calls we would use to fetch data from the `data_source` itself, as it is from S3 (or indeed from any local disk drive). We just need to be sure to set the `data_engine` property of the `MarketDataRequest`.\n",
    "\n",
    "We also briefly showed how to take advantage of in memory caching with Redis."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.8.8"
  },
  "toc": {
   "base_numbering": 1,
   "nav_menu": {},
   "number_sections": true,
   "sideBar": true,
   "skip_h1_title": false,
   "title_cell": "Table of Contents",
   "title_sidebar": "Contents",
   "toc_cell": false,
   "toc_position": {},
   "toc_section_display": true,
   "toc_window_display": false
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
